Search Results: "ardo"

16 November 2022

Antoine Beaupr : A ZFS migration

In my tubman setup, I started using ZFS on an old server I had lying around. The machine is really old though (2011!) and it "feels" pretty slow. I want to see how much of that is ZFS and how much is the machine. Synthetic benchmarks show that ZFS may be slower than mdadm in RAID-10 or RAID-6 configuration, so I want to confirm that on a live workload: my workstation. Plus, I want easy, regular, high performance backups (with send/receive snapshots) and there's no way I'm going to use BTRFS because I find it too confusing and unreliable. So off we go.

Installation Since this is a conversion (and not a new install), our procedure is slightly different than the official documentation but otherwise it's pretty much in the same spirit: we're going to use ZFS for everything, including the root filesystem. So, install the required packages, on the current system:
apt install --yes gdisk zfs-dkms zfs zfs-initramfs zfsutils-linux
We also tell DKMS that we need to rebuild the initrd when upgrading:
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf

Partitioning This is going to partition /dev/sdc with:
  • 1MB MBR / BIOS legacy boot
  • 512MB EFI boot
  • 1GB bpool, unencrypted pool for /boot
  • rest of the disk for zpool, the rest of the data
     sgdisk --zap-all /dev/sdc
     sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
     sgdisk     -n2:1M:+512M   -t2:EF00 /dev/sdc
     sgdisk     -n3:0:+1G      -t3:BF01 /dev/sdc
     sgdisk     -n4:0:0        -t4:BF00 /dev/sdc
    
That will look something like this:
    root@curie:/home/anarcat# sgdisk -p /dev/sdc
    Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
    Model: ESD-S1C         
    Sector size (logical/physical): 512/512 bytes
    Disk identifier (GUID): [REDACTED]
    Partition table holds up to 128 entries
    Main partition table begins at sector 2 and ends at sector 33
    First usable sector is 34, last usable sector is 1953525134
    Partitions will be aligned on 16-sector boundaries
    Total free space is 14 sectors (7.0 KiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1              48            2047   1000.0 KiB  EF02  
       2            2048         1050623   512.0 MiB   EF00  
       3         1050624         3147775   1024.0 MiB  BF01  
       4         3147776      1953525134   930.0 GiB   BF00
Unfortunately, we can't be sure of the sector size here, because the USB controller is probably lying to us about it. Normally, this smartctl command should tell us the sector size as well:
root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black Mobile
Device Model:     WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Above is the example of the builtin HDD drive. But the SSD device enclosed in that USB controller doesn't support SMART commands, so we can't trust that it really has 512 bytes sectors. This matters because we need to tweak the ashift value correctly. We're going to go ahead the SSD drive has the common 4KB settings, which means ashift=12. Note here that we are not creating a separate partition for swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and that issue is still not fixed upstream. Ubuntu recommends using a separate partition for swap instead. But since this is "just" a workstation, we're betting that we will not suffer from this problem, after hearing a report from another Debian developer running this setup on their workstation successfully. We do not recommend this setup though. In fact, if I were to redo this partition scheme, I would probably use LUKS encryption and setup a dedicated swap partition, as I had problems with ZFS encryption as well.

Creating pools ZFS pools are somewhat like "volume groups" if you are familiar with LVM, except they obviously also do things like RAID-10. (Even though LVM can technically also do RAID, people typically use mdadm instead.) In any case, the guide suggests creating two different pools here: one, in cleartext, for boot, and a separate, encrypted one, for the rest. Technically, the boot partition is required because the Grub bootloader only supports readonly ZFS pools, from what I understand. But I'm a little out of my depth here and just following the guide.

Boot pool creation This creates the boot pool in readonly mode with features that grub supports:
    zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off \
        -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool /dev/sdc3
I haven't investigated all those settings and just trust the upstream guide on the above.

Main pool creation This is a more typical pool creation.
    zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool /dev/sdc4
Breaking this down:
  • -o ashift=12: mentioned above, 4k sector size
  • -O encryption=on -O keylocation=prompt -O keyformat=passphrase: encryption, prompt for a password, default algorithm is aes-256-gcm, explicit in the guide, made implicit here
  • -O acltype=posixacl -O xattr=sa: enable ACLs, with better performance (not enabled by default)
  • -O dnodesize=auto: related to extended attributes, less compatibility with other implementations
  • -O compression=zstd: enable zstd compression, can be disabled/enabled by dataset to with zfs set compression=off rpool/example
  • -O relatime=on: classic atime optimisation, another that could be used on a busy server is atime=off
  • -O canmount=off: do not make the pool mount automatically with mount -a?
  • -O mountpoint=/ -R /mnt: mount pool on / in the future, but /mnt for now
Those settings are all available in zfsprops(8). Other flags are defined in zpool-create(8). The reasoning behind them is also explained in the upstream guide and some also in [the Debian wiki][]. Those flags were actually not used:
  • -O normalization=formD: normalize file names on comparisons (not storage), implies utf8only=on, which is a bad idea (and effectively meant my first sync failed to copy some files, including this folder from a supysonic checkout). and this cannot be changed after the filesystem is created. bad, bad, bad.
[the Debian wiki]: https://wiki.debian.org/ZFS#Advanced_Topics

Side note about single-disk pools Also note that we're living dangerously here: single-disk ZFS pools are rumoured to be more dangerous than not running ZFS at all. The choice quote from this article is:
[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but its actually not. The reason its not is that ZFS' metadata cannot be allowed to be corrupted. If it is it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.
Compared with (say) ext4, where a single disk error can recovered, this is pretty bad. But we are ready to live with this with the idea that we'll have hourly offline snapshots that we can easily recover from. It's trade-off. Also, we're running this on a NVMe/M.2 drive which typically just blinks out of existence completely, and doesn't "bit rot" the way a HDD would. Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured at least.

Creating mount points Next we create the actual filesystems, known as "datasets" which are the things that get mounted on mountpoint and hold the actual files.
  • this creates two containers, for ROOT and BOOT
     zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
     zfs create -o canmount=off -o mountpoint=none bpool/BOOT
    
    Note that it's unclear to me why those datasets are necessary, but they seem common practice, also used in this FreeBSD example. The OpenZFS guide mentions the Solaris upgrades and Ubuntu's zsys that use that container for upgrades and rollbacks. This blog post seems to explain a bit the layout behind the installer.
  • this creates the actual boot and root filesystems:
     zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
     zfs mount rpool/ROOT/debian &&
     zfs create -o mountpoint=/boot bpool/BOOT/debian
    
    I guess the debian name here is because we could technically have multiple operating systems with the same underlying datasets.
  • then the main datasets:
     zfs create                                 rpool/home &&
     zfs create -o mountpoint=/root             rpool/home/root &&
     chmod 700 /mnt/root &&
     zfs create                                 rpool/var
    
  • exclude temporary files from snapshots:
     zfs create -o com.sun:auto-snapshot=false  rpool/var/cache &&
     zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp &&
     chmod 1777 /mnt/var/tmp
    
  • and skip automatic snapshots in Docker:
     zfs create -o canmount=off                 rpool/var/lib &&
     zfs create -o com.sun:auto-snapshot=false  rpool/var/lib/docker
    
    Notice here a peculiarity: we must create rpool/var/lib to create rpool/var/lib/docker otherwise we get this error:
     cannot create 'rpool/var/lib/docker': parent does not exist
    
    ... and no, just creating /mnt/var/lib doesn't fix that problem. In fact, it makes things even more confusing because an existing directory shadows a mountpoint, which is the opposite of how things normally work. Also note that you will probably need to change storage driver in Docker, see the zfs-driver documentation for details but, basically, I did:
    echo '  "storage-driver": "zfs"  ' > /etc/docker/daemon.json
    
    Note that podman has the same problem (and similar solution):
    printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
    
  • make a tmpfs for /run:
     mkdir /mnt/run &&
     mount -t tmpfs tmpfs /mnt/run &&
     mkdir /mnt/run/lock
    
We don't create a /srv, as that's the HDD stuff. Also mount the EFI partition:
mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/
At this point, everything should be mounted in /mnt. It should look like this:
root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem            Size  Used Avail Use% Mounted on
rpool/ROOT/debian     899G  384K  899G   1% /mnt
bpool/BOOT/debian     832M  123M  709M  15% /mnt/boot
rpool/home            899G  256K  899G   1% /mnt/home
rpool/home/root       899G  256K  899G   1% /mnt/root
rpool/var             899G  384K  899G   1% /mnt/var
rpool/var/cache       899G  256K  899G   1% /mnt/var/cache
rpool/var/tmp         899G  256K  899G   1% /mnt/var/tmp
rpool/var/lib/docker  899G  256K  899G   1% /mnt/var/lib/docker
/dev/sdc2             511M  4.0K  511M   1% /mnt/boot/efi
Now that we have everything setup and mounted, let's copy all files over.

Copying files This is a list of all the mounted filesystems
for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
You can check that the list is correct with:
mount -l -t ext4,btrfs,vfat   awk ' print $3 '
Note that we skip /srv as it's on a different disk. On the first run, we had:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
     16,831,437 100%  184.14MB/s    0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
 28,019,293,280  94%   47.63MB/s    0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,267,990  98%   50.71MB/s    0:10:40 (xfr#736577, to-chk=0/867732)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
 24,456,268,098  98%   68.03MB/s    0:05:42 (xfr#159867, ir-chk=6875/172377) 
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125  93%   74.82MB/s    0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978  96%   76.82MB/s    1:05:15 (xfr#2256473, to-chk=0/2993950)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Note the failure to transfer that supysonic file? It turns out they had a weird filename in their source tree, since then removed, but still it showed how the utf8only feature might not be such a bad idea. At this point, the procedure was restarted all the way back to "Creating pools", after unmounting all ZFS filesystems (umount /mnt/run /mnt/boot/efi && umount -t zfs -a) and destroying the pool, which, surprisingly, doesn't require any confirmation (zpool destroy rpool). The second run was cleaner:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/110)  
syncing / to /mnt/...
 28,019,033,070  97%   42.03MB/s    0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,807,102  98%   44.84MB/s    0:12:04 (xfr#736580, to-chk=0/867723)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
 24,043,086,450  96%   62.03MB/s    0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507  96%   67.09MB/s    1:14:43 (xfr#2256845, to-chk=0/2994364)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Also note the transfer speed: we seem capped at 76MB/s, or 608Mbit/s. This is not as fast as I was expecting: the USB connection seems to be at around 5Gbps:
anarcat@curie:~$ lsusb -tv   head -4
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
     __ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
        ID 0b05:1932 ASUSTek Computer, Inc.
So it shouldn't cap at that speed. It's possible the USB adapter is failing to give me the full speed though. It's not the M.2 SSD drive either, as that has a ~500MB/s bandwidth, acccording to its spec. At this point, we're about ready to do the final configuration. We drop to single user mode and do the rest of the procedure. That used to be shutdown now, but it seems like the systemd switch broke that, so now you can reboot into grub and pick the "recovery" option. Alternatively, you might try systemctl rescue, as I found out. I also wanted to copy the drive over to another new NVMe drive, but that failed: it looks like the USB controller I have doesn't work with older, non-NVME drives.

Boot configuration Now we need to enter the new system to rebuild the boot loader and initrd and so on. First, we bind mounts and chroot into the ZFS disk:
mount --rbind /dev  /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys  /mnt/sys &&
chroot /mnt /bin/bash
Next we add an extra service that imports the bpool on boot, to make sure it survives a zpool.cache destruction:
cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
EOF
Enable the service:
systemctl enable zfs-import-bpool.service
I had to trim down /etc/fstab and /etc/crypttab to only contain references to the legacy filesystems (/srv is still BTRFS!). If we don't already have a tmpfs defined in /etc/fstab:
ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount
Rebuild boot loader with support for ZFS, but also to workaround GRUB's missing zpool-features support:
grub-probe /boot   grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub
For good measure, make sure the right disk is configured here, for example you might want to tag both drives in a RAID array:
dpkg-reconfigure grub-pc
Install grub to EFI while you're there:
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy
Filesystem mount ordering. The rationale here in the OpenZFS guide is a little strange, but I don't dare ignore that.
mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &
Verify that zed updated the cache by making sure these are not empty:
cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool
Once the files have data, stop zed:
fg
Press Ctrl-C.
Fix the paths to eliminate /mnt:
sed -Ei "s /mnt/? / " /etc/zfs/zfs-list.cache/*
Snapshot initial install:
zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install
Exit chroot:
exit

Finalizing One last sync was done in rescue mode:
for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
Then we unmount all filesystems:
mount   grep -v zfs   tac   awk '/\/mnt/  print $3 '   xargs -i  umount -lf  
zpool export -a
Reboot, swap the drives, and boot in ZFS. Hurray!

Benchmarks This is a test that was ran in single-user mode using fio and the Ars Technica recommended tests, which are:
  • Single 4KiB random write process:
     fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
    
  • 16 parallel 64KiB random write processes:
     fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
    
  • Single 1MiB random write process:
     fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
    
Strangely, that's not exactly what the author, Jim Salter, did in his actual test bench used in the ZFS benchmarking article. The first thing is there's no read test at all, which is already pretty strange. But also it doesn't include stuff like dropping caches or repeating results. So here's my variation, which i called fio-ars-bench.sh for now. It just batches a bunch of fio tests, one by one, 60 seconds each. It should take about 12 minutes to run, as there are 3 pair of tests, read/write, with and without async. My bias, before building, running and analysing those results is that ZFS should outperform the traditional stack on writes, but possibly not on reads. It's also possible it outperforms it on both, because it's a newer drive. A new test might be possible with a new external USB drive as well, although I doubt I will find the time to do this.

Results All tests were done on WD blue SN550 drives, which claims to be able to push 2400MB/s read and 1750MB/s write. An extra drive was bought to move the LVM setup from a WDC WDS500G1B0B-00AS40 SSD, a WD blue M.2 2280 SSD that was at least 5 years old, spec'd at 560MB/s read, 530MB/s write. Benchmarks were done on the M.2 SSD drive but discarded so that the drive difference is not a factor in the test. In practice, I'm going to assume we'll never reach those numbers because we're not actually NVMe (this is an old workstation!) so the bottleneck isn't the disk itself. For our purposes, it might still give us useful results.

Rescue test, LUKS/LVM/ext4 Those tests were performed with everything shutdown, after either entering the system in rescue mode, or by reaching that target with:
systemctl rescue
The network might have been started before or after the test as well:
systemctl start systemd-networkd
So it should be fairly reliable as basically nothing else is running. Raw numbers, from the ?job-curie-lvm.log, converted to MiB/s and manually merged:
test read I/O read IOPS write I/O write IOPS
rand4k4g1x 39.27 10052 212.15 54310
rand4k4g1x--fsync=1 39.29 10057 2.73 699
rand64k256m16x 1297.00 20751 1068.57 17097
rand64k256m16x--fsync=1 1290.90 20654 353.82 5661
rand1m16g1x 315.15 315 563.77 563
rand1m16g1x--fsync=1 345.88 345 157.01 157
Peaks are at about 20k IOPS and ~1.3GiB/s read, 1GiB/s write in the 64KB blocks with 16 jobs. Slowest is the random 4k block sync write at an abysmal 3MB/s and 700 IOPS The 1MB read/write tests have lower IOPS, but that is expected.

Rescue test, ZFS This test was also performed in rescue mode. Raw numbers, from the ?job-curie-zfs.log, converted to MiB/s and manually merged:
test read I/O read IOPS write I/O write IOPS
rand4k4g1x 77.20 19763 27.13 6944
rand4k4g1x--fsync=1 76.16 19495 6.53 1673
rand64k256m16x 1882.40 30118 70.58 1129
rand64k256m16x--fsync=1 1865.13 29842 71.98 1151
rand1m16g1x 921.62 921 102.21 102
rand1m16g1x--fsync=1 908.37 908 64.30 64
Peaks are at 1.8GiB/s read, also in the 64k job like above, but much faster. The write is, as expected, much slower at 70MiB/s (compared to 1GiB/s!), but it should be noted the sync write doesn't degrade performance compared to async writes (although it's still below the LVM 300MB/s).

Conclusions Really, ZFS has trouble performing in all write conditions. The random 4k sync write test is the only place where ZFS outperforms LVM in writes, and barely (7MiB/s vs 3MiB/s). Everywhere else, writes are much slower, sometimes by an order of magnitude. And before some ZFS zealot jumps in talking about the SLOG or some other cache that could be added to improved performance, I'll remind you that those numbers are on a bare bones NVMe drive, pretty much as fast storage as you can find on this machine. Adding another NVMe drive as a cache probably will not improve write performance here. Still, those are very different results than the tests performed by Salter which shows ZFS beating traditional configurations in all categories but uncached 4k reads (not writes!). That said, those tests are very different from the tests I performed here, where I test writes on a single disk, not a RAID array, which might explain the discrepancy. Also, note that neither LVM or ZFS manage to reach the 2400MB/s read and 1750MB/s write performance specification. ZFS does manage to reach 82% of the read performance (1973MB/s) and LVM 64% of the write performance (1120MB/s). LVM hits 57% of the read performance and ZFS hits barely 6% of the write performance. Overall, I'm a bit disappointed in the ZFS write performance here, I must say. Maybe I need to tweak the record size or some other ZFS voodoo, but I'll note that I didn't have to do any such configuration on the other side to kick ZFS in the pants...

Real world experience This section document not synthetic backups, but actual real world workloads, comparing before and after I switched my workstation to ZFS.

Docker performance I had the feeling that running some git hook (which was firing a Docker container) was "slower" somehow. It seems that, at runtime, ZFS backends are significant slower than their overlayfs/ext4 equivalent:
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
Translating this:
  • container setup: ~1 second
  • container runtime: ~1 second
  • container teardown: ~1 second
  • total runtime: 2-3 seconds
Obviously, those timestamps are not quite accurate enough to make precise measurements... After I switched to ZFS:
mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080 
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142. 
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded. 
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" 
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
That's double or triple the run time, from 2 seconds to 6 seconds. Most of the time is spent in run time, inside the container. Here's the breakdown:
  • container setup: ~2 seconds
  • container run: ~4 seconds
  • container teardown: ~1 second
  • total run time: about ~6-7 seconds
That's a two- to three-fold increase! Clearly something is going on here that I should tweak. It's possible that code path is less optimized in Docker. I also worry about podman, but apparently it also supports ZFS backends. Possibly it would perform better, but at this stage I wouldn't have a good comparison: maybe it would have performed better on non-ZFS as well...

Interactivity While doing the offsite backups (below), the system became somewhat "sluggish". I felt everything was slow, and I estimate it introduced ~50ms latency in any input device. Arguably, those are all USB and the external drive was connected through USB, but I suspect the ZFS drivers are not as well tuned with the scheduler as the regular filesystem drivers...

Recovery procedures For test purposes, I unmounted all systems during the procedure:
umount /mnt/boot/efi /mnt/boot/run
umount -a -t zfs
zpool export -a
And disconnected the drive, to see how I would recover this system from another Linux system in case of a total motherboard failure. To import an existing pool, plug the device, then import the pool with an alternate root, so it doesn't mount over your existing filesystems, then you mount the root filesystem and all the others:
zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock

Offsite backup Part of the goal of using ZFS is to simplify and harden backups. I wanted to experiment with shorter recovery times specifically both point in time recovery objective and recovery time objective and faster incremental backups. This is, therefore, part of my backup services. This section documents how an external NVMe enclosure was setup in a pool to mirror the datasets from my workstation. The final setup should include syncoid copying datasets to the backup server regularly, but I haven't finished that configuration yet.

Partitioning The above partitioning procedure used sgdisk, but I couldn't figure out how to do this with sgdisk, so this uses sfdisk to dump the partition from the first disk to an external, identical drive:
sfdisk -d /dev/nvme0n1   sfdisk --no-reread /dev/sda --force

Pool creation This is similar to the main pool creation, except we tweaked a few bits after changing the upstream procedure:
zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O xattr=sa \
        -O compression=lz4 \
        -O devices=off \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/boot -R /mnt \
        bpool-tubman /dev/sdb3
The change from the main boot pool are: Main pool creation is:
zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool-tubman /dev/sdb4

First sync I used syncoid to copy all pools over to the external device. syncoid is a thing that's part of the sanoid project which is specifically designed to sync snapshots between pool, typically over SSH links but it can also operate locally. The sanoid command had a --readonly argument to simulate changes, but syncoid didn't so I tried to fix that with an upstream PR. It seems it would be better to do this by hand, but this was much easier. The full first sync was:
root@curie:/home/anarcat# ./bin/syncoid -r  bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create bpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%            
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================>                                                         ] 53%            
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
 126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
 113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r  rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create rpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================>  ] 99%            
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
 277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
 627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
 443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%            
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================>                                                                            ] 38%            
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%            
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
 219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%
Funny how the CRITICAL ERROR doesn't actually stop syncoid and it just carries on merrily doing when it's telling you it's "cowardly refusing to destroy your existing target"... Maybe that's because my pull request broke something though... During the transfer, the computer was very sluggish: everything feels like it has ~30-50ms latency extra:
anarcat@curie:sanoid$ LANG=C top -b  -n 1   head -20
top - 13:07:05 up 6 days,  4:01,  1 user,  load average: 16.13, 16.55, 11.83
Tasks: 606 total,   6 running, 598 sleeping,   0 stopped,   2 zombie
%Cpu(s): 18.8 us, 72.5 sy,  1.2 ni,  5.0 id,  1.2 wa,  0.0 hi,  1.2 si,  0.0 st
MiB Mem :  15898.4 total,   1387.6 free,  13170.0 used,   1340.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1319.8 avail Mem 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     70 root      20   0       0      0      0 S  83.3   0.0   6:12.67 kswapd0
4024878 root      20   0  282644  96432  10288 S  44.4   0.6   0:11.43 puppet
3896136 root      20   0   35328  16528     48 S  22.2   0.1   2:08.04 mbuffer
3896135 root      20   0   10328    776    168 R  16.7   0.0   1:22.93 zfs
3896138 root      20   0   10588    788    156 R  16.7   0.0   1:49.30 zfs
    350 root       0 -20       0      0      0 R  11.1   0.0   1:03.53 z_rd_int
    351 root       0 -20       0      0      0 S  11.1   0.0   1:04.15 z_rd_int
3896137 root      20   0    4384    352    244 R  11.1   0.0   0:44.73 pv
4034094 anarcat   30  10   20028  13960   2428 S  11.1   0.1   0:00.70 mbsync
4036539 anarcat   20   0    9604   3464   2408 R  11.1   0.0   0:00.04 top
    352 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    353 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    354 root       0 -20       0      0      0 S   5.6   0.0   1:04.01 z_rd_int
I wonder how much of that is due to syncoid, particularly because I often saw mbuffer and pv in there which are not strictly necessary to do those kind of operations, as far as I understand. Once that's done, export the pools to disconnect the drive:
zpool export bpool-tubman
zpool export rpool-tubman

Raw disk benchmark Copied the 512GB SSD/M.2 device to another 1024GB NVMe/M.2 device:
anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 octets (500 GB, 466 GiB) copi s, 1713 s, 292 MB/s
119235+1 enregistrements lus
119235+1 enregistrements  crits
500107862016 octets (500 GB, 466 GiB) copi s, 1719,93 s, 291 MB/s
... while both over USB, whoohoo 300MB/s!

Monitoring ZFS should be monitoring your pools regularly. Normally, the [[!debman zed]] daemon monitors all ZFS events. It is the thing that will report when a scrub failed, for example. See this configuration guide. Scrubs should be regularly scheduled to ensure consistency of the pool. This can be done in newer zfsutils-linux versions (bullseye-backports or bookworm) with one of those, depending on the desired frequency:
systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@rpool.timer --now
When the scrub runs, if it finds anything it will send an event which will get picked up by the zed daemon which will then send a notification, see below for an example. TODO: deploy on curie, if possible (probably not because no RAID) TODO: this should be in Puppet

Scrub warning example So what happens when problems are found? Here's an example of how I dealt with an error I received. After setting up another server (tubman) with ZFS, I eventually ended up getting a warning from the ZFS toolchain.
Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
   eid: 39536
 class: scrub_finish
  host: tubman
  time: 2022-10-09 00:58:07-0400
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct  9 00:58:07 2022
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb4    ONLINE       0     1     0
            sdc4    ONLINE       0     0     0
        cache
          sda3      ONLINE       0     0     0
errors: No known data errors
This, in itself, is a little worrisome. But it helpfully links to this more detailed documentation (and props up there: the link still works) which explains this is a "minor" problem (something that could be included in the report). In this case, this happened on a server setup on 2021-04-28, but the disks and server hardware are much older. The server itself (marcos v1) was built around 2011, over 10 years ago now. The hard drive in question is:
root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Some more SMART stats:
root@tubman:~# smartctl -a -qnoserial /dev/sdb   grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12464 (206 202 0)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10966h+55m+23.757s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21107792664
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3201579750
That's over a year of power on, which shouldn't be so bad. It has written about 10TB of data (21107792664 LBAs * 512 byte/LBA), which is about two full writes. According to its specification, this device is supposed to support 55 TB/year of writes, so we're far below spec. Note that are still far from the "non-recoverable read error per bits" spec (1 per 10E15), as we've basically read 13E12 bits (3201579750 LBAs * 512 byte/LBA = 13E12 bits). It's likely this disk was made in 2018, so it is in its fourth year. Interestingly, /dev/sdc is also a Seagate drive, but of a different series:
root@tubman:~# smartctl -qnoserial  -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
It has seen much more reads than the other disk which is also interesting:
root@tubman:~# smartctl -a -qnoserial /dev/sdc   grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       36240
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       33994h+10m+52.118s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       30730174438
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       51894566538
That's 4 years of Head_Flying_Hours, and over 4 years (4 years and 48 days) of Power_On_Hours. The copyright date on that drive's specs goes back to 2016, so it's a much older drive. SMART self-test succeeded.

Remaining issues
  • TODO: move send/receive backups to offsite host, see also zfs for alternatives to syncoid/sanoid there
  • TODO: setup backup cron job (or timer?)
  • TODO: swap still not setup on curie, see zfs
  • TODO: document this somewhere: bpool and rpool are both pools and datasets. that's pretty confusing, but also very useful because it allows for pool-wide recursive snapshots, which are used for the backup system

fio improvements I really want to improve my experience with fio. Right now, I'm just cargo-culting stuff from other folks and I don't really like it. stressant is a good example of my struggles, in the sense that it doesn't really work that well for disk tests. I would love to have just a single .fio job file that lists multiple jobs to run serially. For example, this file describes the above workload pretty well:
[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1
... except the jobs are actually started in parallel, even though they are stonewall'd, as far as I can tell by the reports. I sent a mail to the fio mailing list for clarification. It looks like the jobs are started in parallel, but actual (correctly) run serially. It seems like this might just be a matter of reporting the right timestamps in the end, although it does feel like starting all the processes (even if not doing any work yet) could skew the results.

Hangs during procedure During the procedure, it happened a few times where any ZFS command would completely hang. It seems that using an external USB drive to sync stuff didn't work so well: sometimes it would reconnect under a different device (from sdc to sdd, for example), and this would greatly confuse ZFS. Here, for example, is sdd reappearing out of the blue:
May 19 11:22:53 curie kernel: [  699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [  699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [  699.922433] scsi 4:0:0:0: Direct-Access     ROG      ESD-S1C          0    PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [  699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [  699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [  699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [  699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [  699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [  699.961602]  sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [  699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk
Next time I run a ZFS command (say zpool list), the command completely hangs (D state) and this comes up in the logs:
May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648] 
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync        state:D stack:    0 pid:  997 ppid:     2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670]  schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675]  ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678]  io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689]  __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702]  __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816]  zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929]  dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032]  spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245]  txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354]  ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366]  thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376]  ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379]  kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382]  ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386]  ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412]  ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420]  schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424]  __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435]  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537]  spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644]  zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool           state:D stack:    0 pid:11815 ppid:  3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995]  io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004]  cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118]  txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223]  txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325]  spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430]  ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
Here's another example, where you see the USB controller bleeping out and back into existence:
mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel:  __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel:  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel:  spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel:  zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool           state:D stack:    0 pid:11815 ppid:  2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel:  cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel:  ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel:  txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel:  txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel:  spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel:  ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
I understand those are rather extreme conditions: I would fully expect the pool to stop working if the underlying drives disappear. What doesn't seem acceptable is that a command would completely hang like this.

References See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

12 November 2022

Debian Brasil: About Debian Brasil at Latinoware 2022

From November 2nd to 4th, 2022, the 19th edition of Latinoware - Latin American Congress of Free Software and Open Technologies took place in Foz do Igua u. After 2 years happening online due to the COVID-19 pandemic, the event was back in person and we felt Debian Brasil community should be there. Out last time at Latinoware was in 2016 The Latinoware organization provided the Debian Brazil community with a booth so that we could have contact with people visiting the open exhibition area and thus publicize the Debian project. During the 3 days of the event, the booth was organized by me (Paulo Henrique Santana) as Debian Developer, and by Leonardo Rodrigues as Debian contributor. Unfortunately Daniel Lenharo had an issue and could not travel to Foz do Igua u (we miss you there!). Latinoware 2022 booth 1 A huge number of people visited the booth, and the beginners (mainly students) who didn't know Debian, asked what our group was about and we explained various concepts such as what Free Software is, GNU/Linux distribution and Debian itself. We also received people from the Brazilian Free Software community and from other Latin American countries who were already using a GNU/Linux distribution and, of course, many people who were already using Debian. We had some special visitors as Jon maddog Hall, Debian Developer Emeritus Ot vio Salvador, Debian Developer Eriberto Mota, and Debian Maintainers Guilherme de Paula Segundo and Paulo Kretcheu. Latinoware 2022 booth 4 Photo from left to right: Leonardo, Paulo, Eriberto and Ot vio. Latinoware 2022 estande 5 Photo from left to right: Paulo, Fabian (Argentina) and Leonardo. In addition to talking a lot, we distributed Debian stickers that were produced a few months ago with Debian's sponsorship to be distributed at DebConf22 (and that were left over), and we sold several Debian t-shirts) produced by Curitiba Livre community). Latinoware 2022 booth 2 Latinoware 2022 booth 3 We also had 3 talks included in Latinoware official schedule. I) talked about: "how to become a Debian contributor by doing translations" and "how the SysAdmins of a global company use Debian". And Leonardo) talked about: "advantages of Open Source telephony in companies". Latinoware 2022 booth 6 Photo Paulo in his talk. Many thanks to Latinoware organization for once again welcoming the Debian community and kindly providing spaces for our participation, and we congratulate all the people involved in the organization for the success of this important event for our community. We hope to be present again in 2023. We also thank Jonathan Carter for approving financial support from Debian for our participation at Latinoware. Portuguese version

Debian Brasil: Participa o da comunidade Debian no Latinoware 2022

De 2 a 4 de novembro de 2022 aconteceu a 19 edi o do Latinoware - Congresso Latino-americano de Software Livre e Tecnologias Abertas, em Foz do Igua u. Ap s 2 anos acontecendo de forma online devido a pandemia do COVID-19, o evento voltou a ser presencial e sentimos que a comunidade Debian Brasil deveria estar presente. Nossa ltima participa o no Latinoware foi em 2016 A organiza o do Latinoware cedeu para a comunidade Debian Brasil um estande para que pud ssemos ter contato com as pessoas que visitavam a rea aberta de exposi es e assim divulgarmos o projeto Debian. Durante os 3 dias do evento, o estande foi organizado por mim (Paulo Henrique Santana) como Desenvolvedor Debian, e pelo Leonardo Rodrigues como contribuidor Debian. Infelizmente o Daniel Lenharo teve um imprevisto de ltima hora e n o pode ir para Foz do Igua u (sentimos sua falta l !). Latinoware 2022 estande 1 V rias pessoas visitaram o estande e aquelas mais iniciantes (principalmente estudantes) que n o conheciam o Debian, perguntavam do que se tratava o nosso grupo e a gente explicava v rios conceitos como o que Software Livre, distribui o GNU/Linux e o Debian propriamente dito. Tamb m recebemos pessoas da comunidade de Software Livre brasileira e de outros pa ses da Am rica Latina que j utilizavam uma distribui o GNU/Linux e claro, muitas pessoas que j utilizavam Debian. Tivemos algumas visitas especiais como do Jon maddog Hall, do Desenvolvedor Debian Emeritus Ot vio Salvador, do Desenvolvedor Debian Eriberto Mota, e dos Mantenedores Debian Guilherme de Paula Segundo e Paulo Kretcheu. Latinoware 2022 estande 4 Foto da esquerda pra direita: Leonardo, Paulo, Eriberto e Ot vio. Latinoware 2022 estande 5 Foto da esquerda pra direita: Paulo, Fabian (Argentina) e Leonardo. Al m de conversarmos bastante, distribu mos adesivos do Debian que foram produzidos alguns meses atr s com o patroc nio do Debian para serem distribu dos na DebConf22(e que haviam sobrado), e vendemos v rias camisetas do Debian produzidas pela comunidade Curitiba Livre. Latinoware 2022 estande 2 Latinoware 2022 estande 3 Tamb m tivemos 3 palestras inseridas na programa o oficial do Latinoware. Eu fiz as palestras: como tornar um(a) contribuidor(a) do Debian fazendo tradu es e como os SysAdmins de uma empresa global usam Debian . E o Leonardo fez a palestra: vantagens da telefonia Open Source nas empresas . Latinoware 2022 estande 6 Foto Paulo na palestra. Agradecemos a organiza o do Latinoware por receber mais uma vez a comunidade Debian e gentilmente ceder os espa os para a nossa participa o, e parabenizamos a todas as pessoas envolvidas na organiza o pelo sucesso desse importante evento para a nossa comunidade. Esperamos estar presentes novamente em 2023. Agracemos tamb m ao Jonathan Carter por aprovar o suporte financeiro do Debian para a nossa participa o no Latinoware. Vers o em ingl s

11 November 2022

Debian Brasil: Participa o da comunidade Debian no Latinoware 2022

De 2 a 4 de novembro de 2022 aconteceu a 19 edi o do Latinoware - Congresso Latino-americano de Software Livre e Tecnologias Abertas, em Foz do Igua u. Ap s 2 anos acontecendo de forma online devido a pandemia do COVID-19, o evento voltou a ser presencial e sentimos que a comunidade Debian Brasil deveria estar presente. Nossa ltima participa o no Latinoware foi em 2016 A organiza o do Latinoware cedeu para a comunidade Debian Brasil um estande para que pud ssemos ter contato com as pessoas que visitavam a rea aberta de exposi es e assim divulgarmos o projeto Debian. Durante os 3 dias do evento, o estande foi organizado por mim (Paulo Henrique Santana) como Desenvolvedor Debian, e pelo Leonardo Rodrigues como contribuidor Debian. Infelizmente o Daniel Lenharo teve um imprevisto de ltima hora e n o pode ir para Foz do Igua u (sentimos sua falta l !). Latinoware 2022 estande 1 V rias pessoas visitaram o estande e aquelas mais iniciantes (principalmente estudantes) que n o conheciam o Debian, perguntavam do que se tratava o nosso grupo e a gente explicava v rios conceitos como o que Software Livre, distribui o GNU/Linux e o Debian propriamente dito. Tamb m recebemos pessoas da comunidade de Software Livre brasileira e de outros pa ses da Am rica Latina que j utilizavam uma distribui o GNU/Linux e claro, muitas pessoas que j utilizavam Debian. Tivemos algumas visitas especiais como do Jon maddog Hall, do Desenvolvedor Debian Emeritus Ot vio Salvador, do Desenvolvedor Debian Eriberto Mota, e dos Mantenedores Debian Guilherme de Paula Segundo e Paulo Kretcheu. Latinoware 2022 estande 4 Foto da esquerda pra direita: Leonardo, Paulo, Eriberto e Ot vio. Latinoware 2022 estande 5 Foto da esquerda pra direita: Paulo, Fabian (Argentina) e Leonardo. Al m de conversarmos bastante, distribu mos adesivos do Debian que foram produzidos alguns meses atr s com o patroc nio do Debian para serem distribu dos na DebConf22(e que haviam sobrado), e vendemos v rias camisetas do Debian produzidas pela comunidade Curitiba Livre. Latinoware 2022 estande 2 Latinoware 2022 estande 3 Tamb m tivemos 3 palestras inseridas na programa o oficial do Latinoware. Eu fiz as palestras: como tornar um(a) contribuidor(a) do Debian fazendo tradu es e como os SysAdmins de uma empresa global usam Debian . E o Leonardo fez a palestra: vantagens da telefonia Open Source nas empresas . Latinoware 2022 estande 6 Foto Paulo na palestra. Agradecemos a organiza o do Latinoware por receber mais uma vez a comunidade Debian e gentilmente ceder os espa os para a nossa participa o, e parabenizamos a todas as pessoas envolvidas na organiza o pelo sucesso desse importante evento para a nossa comunidade. Esperamos estar presentes novamente em 2023. Agracemos tamb m ao Jonathan Carter por aprovar o suporte financeiro do Debian para a nossa participa o no Latinoware. Vers o em ingl s

Debian Brasil: About Debian Brasil at Latinoware 2022

From November 2nd to 4th, 2022, the 19th edition of Latinoware - Latin American Congress of Free Software and Open Technologies took place in Foz do Igua u. After 2 years happening online due to the COVID-19 pandemic, the event was back in person and we felt Debian Brasil community should be there. Out last time at Latinoware was in 2016 The Latinoware organization provided the Debian Brazil community with a booth so that we could have contact with people visiting the open exhibition area and thus publicize the Debian project. During the 3 days of the event, the booth was organized by me (Paulo Henrique Santana) as Debian Developer, and by Leonardo Rodrigues as Debian contributor. Unfortunately Daniel Lenharo had an issue and could not travel to Foz do Igua u (we miss you there!). Latinoware 2022 booth 1 A huge number of people visited the booth, and the beginners (mainly students) who didn't know Debian, asked what our group was about and we explained various concepts such as what Free Software is, GNU/Linux distribution and Debian itself. We also received people from the Brazilian Free Software community and from other Latin American countries who were already using a GNU/Linux distribution and, of course, many people who were already using Debian. We had some special visitors as Jon maddog Hall, Debian Developer Emeritus Ot vio Salvador, Debian Developer Eriberto Mota, and Debian Maintainers Guilherme de Paula Segundo and Paulo Kretcheu. Latinoware 2022 booth 4 Photo from left to right: Leonardo, Paulo, Eriberto and Ot vio. Latinoware 2022 estande 5 Photo from left to right: Paulo, Fabian (Argentina) and Leonardo. In addition to talking a lot, we distributed Debian stickers that were produced a few months ago with Debian's sponsorship to be distributed at DebConf22 (and that were left over), and we sold several Debian t-shirts) produced by Curitiba Livre community). Latinoware 2022 booth 2 Latinoware 2022 booth 3 We also had 3 talks included in Latinoware official schedule. I) talked about: "how to become a Debian contributor by doing translations" and "how the SysAdmins of a global company use Debian". And Leonardo) talked about: "advantages of Open Source telephony in companies". Latinoware 2022 booth 6 Photo Paulo in his talk. Many thanks to Latinoware organization for once again welcoming the Debian community and kindly providing spaces for our participation, and we congratulate all the people involved in the organization for the success of this important event for our community. We hope to be present again in 2023. We also thank Jonathan Carter for approving financial support from Debian for our participation at Latinoware. Portuguese version

9 November 2022

Debian Brasil: Brasileiros(as) Mantenedores(as) e Desenvolvedores(as) Debian a partir de julho de 2015

Desde de setembro de 2015, o time de publicidade do Projeto Debian passou a publicar a cada dois meses listas com os nomes dos(as) novos(as) Desenvolvedores(as) Debian (DD - do ingl s Debian Developer) e Mantenedores(as) Debian (DM - do ingl s Debian Maintainer). Estamos aproveitando estas listas para publicar abaixo os nomes dos(as) brasileiros(as) que se tornaram Desenvolvedores(as) e Mantenedores(as) Debian a partir de julho de 2015. Desenvolvedores(as) Debian / Debian Developers / DDs: Marcos Talau Fabio Augusto De Muzio Tobich Gabriel F. T. Gomes Thiago Andrade Marques M rcio de Souza Oliveira Paulo Henrique de Lima Santana Samuel Henrique S rgio Durigan J nior Daniel Lenharo de Souza Giovani Augusto Ferreira Adriano Rafael Gomes Breno Leit o Lucas Kanashiro Herbert Parentes Fortes Neto Mantenedores(as) Debian / Debian Maintainers / DMs: Guilherme de Paula Xavier Segundo David da Silva Polverari Paulo Roberto Alves de Oliveira Sergio Almeida Cipriano Junior Francisco Vilmar Cardoso Ruviaro William Grzybowski Tiago Ilieve
Observa es:
  1. Esta lista ser atualizada quando o time de publicidade do Debian publicar novas listas com DMs e DDs e tiver brasileiros.
  2. Para ver a lista completa de Mantenedores(as) e Desenvolvedores(as) Debian, inclusive outros(as) brasileiros(as) antes de julho de 2015 acesse: https://nm.debian.org/public/people

Debian Brasil: Brasileiros(as) Mantenedores(as) e Desenvolvedores(as) Debian a partir de julho de 2015

Desde de setembro de 2015, o time de publicidade do Projeto Debian passou a publicar a cada dois meses listas com os nomes dos(as) novos(as) Desenvolvedores(as) Debian (DD - do ingl s Debian Developer) e Mantenedores(as) Debian (DM - do ingl s Debian Maintainer). Estamos aproveitando estas listas para publicar abaixo os nomes dos(as) brasileiros(as) que se tornaram Desenvolvedores(as) e Mantenedores(as) Debian a partir de julho de 2015. Desenvolvedores(as) Debian / Debian Developers / DDs: Marcos Talau Fabio Augusto De Muzio Tobich Gabriel F. T. Gomes Thiago Andrade Marques M rcio de Souza Oliveira Paulo Henrique de Lima Santana Samuel Henrique S rgio Durigan J nior Daniel Lenharo de Souza Giovani Augusto Ferreira Adriano Rafael Gomes Breno Leit o Lucas Kanashiro Herbert Parentes Fortes Neto Mantenedores(as) Debian / Debian Maintainers / DMs: Guilherme de Paula Xavier Segundo David da Silva Polverari Paulo Roberto Alves de Oliveira Sergio Almeida Cipriano Junior Francisco Vilmar Cardoso Ruviaro William Grzybowski Tiago Ilieve
Observa es:
  1. Esta lista ser atualizada quando o time de publicidade do Debian publicar novas listas com DMs e DDs e tiver brasileiros.
  2. Para ver a lista completa de Mantenedores(as) e Desenvolvedores(as) Debian, inclusive outros(as) brasileiros(as) antes de julho de 2015 acesse: https://nm.debian.org/public/people

25 October 2022

Dirk Eddelbuettel: nanotime 0.3.7 on CRAN: Enhancements

A new version of our nanotime package arrived at CRAN today as version 0.3.7. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release adds a few more operators, plus some other fixes, that were contributed in several PRs by Trevor Davis. The NEWS snippet has the full details.

Changes in version 0.3.7 (2022-10-23)
  • Update mkdocs for material docs generator (Dirk in #102)
  • Use inherits() instead comparing to class() (Trevor Davis in #104)
  • Set default arguments in nanoduration() (Trevor Davis in #105)
  • Add as.nanoduration.difftime() support (Trevor Davis in #106)
  • Add +/- methods for nanotime and difftime objects (Trevor Davis in #110 closing #108, #107)

Thanks to my CRANberries there is also a diff to the previous version. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

10 October 2022

Ian Jackson: Skipping releases when upgrading Debian systems

Debian does not officially support upgrading from earlier than the previous stable release: you re not supposed to skip releases. Instead, you re supposed to upgrade to each intervening major release in turn. However, skipping intervening releases does, in fact, often work quite well. Apparently, this is surprising to many people, even Debian insiders. I was encouraged to write about it some more. My personal experience I have three conventionally-managed personal server systems (by which I mean systems which aren t reprovisioned by some kind of automation). Of these at least two have been skip upgraded at least once: The one I don t think I ve skip-upgraded (at least, not recently) is my house network manager (and now VM host) which I try to keep to a minimum in terms of functionality and which I keep quite up to date. It was crossgraded from i386 (32-bit) to amd64 (64-bit) fairly recently, which is a thing that Debian isn t sure it supports. The crossgrade was done a hurry and without any planning, prompted by Spectre et al suddenly requiring big changes to Xen. But it went well enough. My home does random stuff server (media server, web cache, printing, DNS, backups etc.), has etckeeper records starting in 2015. I upgraded directly from jessie (Debian 8) to buster (Debian 10). I think it has probably had earlier skip upgrade(s): the oldest file in /etc is from December 1996 and I have been doing occasional skip upgrades as long as I can remember. And of course there s chiark, which is one of the oldest Debian installs in existence. I wrote about the most recent upgrade, where I went directly from jessie i386 ELTS (32-bit Debian 8) to bulleye amd64 (64-bit Debian 11). That was a very extreme case which required significant planning and pre-testing, since the package dependencies were in no way sufficient for the proper ordering. But, I don t normally go to such lengths. Normally, even on chiark, I just edit the sources.list and see what apt proposes to do. I often skip upgrade chiark because I tend to defer risky-looking upgrades partly in the hope of others fixing the bugs while I wait :-), and partly just because change is disruptive and amortising it is very helpful both to me and my users. I have some records of chiark s upgrades from my announcements to users. As well as the recent skip skip up cross grade, direct , I definitely did a skip upgrade of chiark from squeeze (Debian 6) to jessie (Debian 8). It appears that the previous skip upgrade on chiark was rex (Debian 1.2) to hamm (Debian 2.0). I don t think it s usual for me to choose to do a multi-release upgrade the officially supported way, in two (or more) stages, on a server. I have done that on systems with a GUI desktop setup, but even then I usually skip the intermediate reboot(s). When to skip upgrade (and what precautions to take) I m certainly not saying that everyone ought to be doing this routinely. Most users with a Debian install that is older than oldstable probably ought to reinstall it, or do the two-stage upgrade. Skip upgrading almost always runs into some kind of trouble (albeit, usually trouble that isn t particularly hard to fix if you know what you re doing). However, officially supported non-skip upgrades go wrong too. Doing a two-or-more-releases upgrade via the intermediate releases can expose you to significant bugs in the intermediate releases, which were later fixed. Because Debian s users and downstreams are cautious, and Debian itself can be slow, it is common for bugs to appear for one release and then be fixed only in the next. Paradoxically, this seems to be especially true with the kind of big and scary changes where you d naively think the upgrade mechanisms would break if you skipped the release where the change first came in. I would not recommend a skip upgrade to someone who is not a competent Debian administrator, with good familiarity with Debian package management, including use of dpkg directly to fix things up. You should have a mental toolkit of manual bug workaround techniques. I always should make sure that I have rescue media (and in the case of a remote system, full remote access including ability to boot a different image), although I don t often need it. And, when considering a skip upgrade, you should be aware of the major changes that have occurred in Debian. Skip upgrading is more likely to be a good idea with a complex and highly customised system: a fairly vanilla install is not likely to encounter problems during a two-stage update. (And, a vanilla system can be upgraded by reinstalling.) I haven t recently skip upgraded a laptop or workstation. I doubt I would attempt it; modern desktop software seems to take a much harder line about breaking things that are officially unsupported, and generally trying to force everyone into the preferred mold. A request to Debian maintainers I would like to encourage Debian maintainers to defer removing upgrade compatibility machinery until it is actually getting in the way, or has become hazardous, or many years obsolete. Examples of the kinds of things which it would be nice to keep, and which do not usually cause much inconvenience to retain, are dependency declarations (particularly, alternatives), and (many) transitional fragments in maintainer scripts. If you find yourself needing to either delete some compatibility feature, or refactor/reorganise it, I think it is probably best to delete it. If you modify it significantly, the resulting thing (which won t be tested until someone uses it in anger) is quite likely to have bugs which make it go wrong more badly (or, more confusingly) than the breakage that would happen without it. Obviously this is all a judgement call. I m not saying Debian should formally support skip upgrades, to the extent of (further) slowing down important improvements. Nor am I asking for any change to the routine approach to (for example) transitional packages (i.e. the technique for ensuring continuity of installation when a package name changes). We try to make release upgrades work perfectly; but skip upgrades don t have to work perfectly to be valuable. Retaining compatibility code can also make it easier to provide official backports, and it probably helps downstreams with different release schedules. The fact that maintainers do in practice often defer removing compatibility code provides useful flexibility and options to at least some people. So it would be nice if you d at least not go out of your way to break it. Building on older releases I would also like to encourage maintainers to provide source packages in Debian unstable that will still build on older releases, where this isn t too hard and the resulting binaries might be basically functional. Speaking personally, it s not uncommon for me to rebuild packages from unstable and install them on much older releases. This is another thing that is not officially supported, but which often works well. I m not saying to contort your build system, or delay progress. You ll definitely want to depend on a recentish debhelper. But, for example, retaining old build-dependency alternatives is nice. Retaining them doesn t constitute a promise that it works - it just makes life slightly easier for someone who is going off piste. If you know you have users on multiple distros or multiple releases, and wish to fully support them, you can go further, of course. Many of my own packages are directly buildable, or even directly installable, on older releases.

comment count unavailable comments

30 July 2022

Ian Jackson: chiark s skip-skip-cross-up-grade

Two weeks ago I upgraded chiark from Debian jessie i386 to bullseye amd64, after nearly 30 years running Debian i386. This went really quite well, in fact! Background chiark is my colo - a server I run, which lives in a data centre in London. It hosts ~200 users with shell accounts, various websites and mailing lists, moderators for a number of USENET newsgroups, and countless other services. chiark s internal setup is designed to enable my users to do a maximum number of exciting things with a minimum of intervention from me. chiark s OS install dates to 1993, when I installed Debian 0.93R5, the first version of Debian to advertise the ability to be upgraded without reinstalling. I think that makes it one of the oldest Debian installations in existence. Obviously it s had several new hardware platforms too. (There was a prior install of Linux on the initial hardware, remnants of which can maybe still be seen in some obscure corners of chiark s /usr/local.) chiark s install is also at the very high end of the installation complexity, and customisation, scale: reinstalling it completely would be an enormous amount of work. And it s unique. chiark s upgrade history chiark s last major OS upgrade was to jessie (Debian 8, released in April 2015). That was in 2016. Since then we have been relying on Debian s excellent security support posture, and the Debian LTS and more recently Freexian s Debian ELTS projects and some local updates, The use of ELTS - which supports only a subset of packages - was particularly uncomfortable. Additionally, chiark was installed with 32-bit x86 Linux (Debian i386), since that was what was supported and available at the time. But 32-bit is looking very long in the tooth. Why do a skip upgrade So, I wanted to move to the fairly recent stable release - Debian 11 (bullseye), which is just short of a year old. And I wanted to crossgrade (as its called) to 64-bit. In the past, I have found I have had greater success by doing direct upgrades, skipping intermediate releases, rather than by following the officially-supported path of going via every intermediate release. Doing a skip upgrade avoids exposure to any packaging bugs which were present only in intermediate release(s). Debian does usually fix bugs, but Debian has many cautious users, so it is not uncommon for bugs to be found after release, and then not be fixed until the next one. A skip upgrade avoids the need to try to upgrade to already-obsolete releases (which can involve messing about with multiple snapshots from snapshot.debian.org. It is also significantly faster and simpler, which is important not only because it reduces downtime, but also because it removes opportunities (and reduces the time available) for things to go badly. One downside is that sometimes maintainers aggressively remove compatibility measures for older releases. (And compatibililty packages are generally removed quite quickly by even cautious maintainers.) That means that the sysadmin who wants to skip-upgrade needs to do more manual fixing of things that haven t been dealt with automatically. And occasionally one finds compatibility problems that show up only when mixing very old and very new software, that no-one else has seen. Crossgrading Crossgrading is fairly complex and hazardous. It is well supported by the low level tools (eg, dpkg) but the higher-level packaging tools (eg, apt) get very badly confused. Nowadays the system is so complex that downloading things by hand and manually feeding them to dpkg is impractical, other than as a very occasional last resort. The approach, generally, has been to set the system up to want to be the new architecture, run apt in a download-only mode, and do the package installation manually, with some fixing up and retrying, until the system is coherent enough for apt to work. This is the approach I took. (In current releases, there are tools that will help but they are only in recent releases and I wanted to go direct. I also doubted that they would work properly on chiark, since it s so unusual.) Peril and planning Overall, this was a risky strategy to choose. The package dependencies wouldn t necessarily express all of the sequencing needed. But it still seemed that if I could come up with a working recipe, I could do it. I restored most of one of chiark s backups onto a scratch volume on my laptop. With the LVM snapshot tools and chroots. I was able to develop and test a set of scripts that would perform the upgrade. This was a very effective approach: my super-fast laptop, with local caches of the package repositories, was able to do many edit, test, debug cycles. My recipe made heavy use of snapshot.debian.org, to make sure that it wouldn t rot between testing and implementation. When I had a working scheme, I told my users about the planned downtime. I warned everyone it might take even 2 or 3 days. I made sure that my access arrangemnts to the data centre were in place, in case I needed to visit in person. (I have remote serial console and power cycler access.) Reality - the terrible rescue install My first task on taking the service down was the check that the emergency rescue installation worked: chiark has an ancient USB stick in the back, which I can boot to from the BIOS. The idea being that many things that go wrong could be repaired from there. I found that that install was too old to understand chiark s storage arrangements. mdadm tools gave very strange output. So I needed to upgrade it. After some experiments, I rebooted back into the main install, bringing chiark s service back online. I then used the main install of chiark as a kind of meta-rescue-image for the rescue-image. The process of getting the rescue image upgraded (not even to amd64, but just to something not totally ancient) was fraught. Several times I had to rescue it by copying files in from the main install outside. And, the rescue install was on a truly ancient 2G USB stick which was terribly terribly slow, and also very small. I hadn t done any significant planning for this subtask, because it was low-risk: there was little way to break the main install. Due to all these adverse factors, sorting out the rescue image took five hours. If I had known how long it would take, at the beginning, I would have skipped it. 5 hours is more than it would have taken to go to London and fix something in person. Reality - the actual core upgrade I was able to start the actual upgrade in the mid-afternoon. I meticulously checked and executed the steps from my plan. The terrifying scripts which sequenced the critical package updates ran flawlessly. Within an hour or so I had a system which was running bullseye amd64, albeit with many important packages still missing or unconfigured. So I didn t need the rescue image after all, nor to go to the datacentre. Fixing all the things Then I had to deal with all the inevitable fallout from an upgrade. Notable incidents: exim4 has a new tainting system This is to try to help the sysadmin avoid writing unsafe string interpolations. ( Little Bobby Tables. ) This was done by Exim upstream in a great hurry as part of a security response process. The new checks meant that the mail configuration did not work at all. I had to turn off the taint check completely. I m fairly confident that this is correct, because I am hyper-aware of quoting issues and all of my configuration is written to avoid the problems that tainting is supposed to avoid. One particular annoyance is that the approach taken for sqlite lookups makes it totally impossible to use more than one sqlite database. I think the sqlite quoting operator which one uses to interpolate values produces tainted output? I need to investigate this properly. LVM now ignores PVs which are directly contained within LVs by default chiark has LVM-on-RAID-on-LVM. This generally works really well. However, there was one edge case where I ended up without the intermediate RAID layer. The result is LVM-on-LVM. But recent versions of the LVM tools do not look at PVs inside LVs, by default. This is to help you avoid corrupting the state of any VMs you have on your system. I didn t know that at the time, though. All I knew was that LVM was claiming my PV was unusable , and wouldn t explain why. I was about to start on a thorough reading of the 15,000-word essay that is the commentary in the default /etc/lvm/lvm.conf to try to see if anything was relevant, when I received a helpful tipoff on IRC pointing me to the scan_lvs option. I need to file a bug asking for the LVM tools to explain why they have declared a PV unuseable. apache2 s default config no longer read one of my config files I had to do a merge (of my changes vs the maintainers changes) for /etc/apache2/apache2.conf. When doing this merge I failed to notice that the file /etc/apache2/conf.d/httpd.conf was no longer included by default. My merge dropped that line. There were some important things in there, and until I found this the webserver was broken. dpkg --skip-same-version DTWT during a crossgrade (This is not a fix all the things - I found it when developing my upgrade process.) When doing a crossgrade, one often wants to say to dpkg install all these things, but don t reinstall things that have already been done . That s what --skip-same-version is for. However, the logic had not been updated as part of the work to support multiarch, so it was wrong. I prepared a patched version of dpkg, and inserted it in the appropriate point in my prepared crossgrade plan. The patch is now filed as bug #1014476 against dpkg upstream Mailman Mailman is no longer in bullseye. It s only available in the previous release, buster. bullseye has Mailman 3 which is a totally different system - requiring basically, a completely new install and configuration. To even preserve existing archive links (a very important requirement) is decidedly nontrivial. I decided to punt on this whole situation. Currently chiark is running buster s version of Mailman. I will have to deal with this at some point and I m not looking forward to it. Python Of course that Mailman is Python 2. The Python project s extremely badly handled transition includes a recommendation to change the meaning of #!/usr/bin/python from Python 2, to Python 3. But Python 3 is a new language, barely compatible with Python 2 even in the most recent iterations of both, and it is usual to need to coinstall them. Happily Debian have provided the python-is-python2 package to make things work sensibly, albeit with unpleasant imprecations in the package summary description. USENET news Oh my god. INN uses many non-portable data formats, which just depend on your C types. And there are complicated daemons, statically linked libraries which cache on-disk data, and much to go wrong. I had numerous problems with this, and several outages and malfunctions. I may write about that on a future occasion.
(edited 2022-07-20 11:36 +01:00 and 2022-07-30 12:28+01:00 to fix typos)


comment count unavailable comments

30 May 2022

Gunnar Wolf: On to the next journey

Last Wednesday my father, Kurt Bernardo Wolf Bogner, took the steps towards his next journey, the last that would start in this life. I cannot put words to this so just sharing this with the world will have to suffice. Goodbye to my teacher, my friend, the person I have always looked up to. Some of his friends were able to put in words more than what I can come up with. If you can read Spanish, you can read the eulogy from the Science Academy of Morelos. His last project, enjoyable by anybody who reads Spanish, is the book with his account of his youth travels Asia, Africa and Europe, beteen 1966 and 1970. You can read it online. And I have many printed copies, in case you want one as well. We will always remember you with love.

21 March 2022

Bits from Debian: New Debian Developers and Maintainers (January and February 2022)

The following contributor got his Debian Developer account in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

11 March 2022

Dirk Eddelbuettel: dtts 0.1.0 on CRAN: New Package

Leonardo and I are thrilled to announce the first CRAN release of dtts. The dtts package builds on top of both our nanotime package and the well-loved and widely-used data.table package by Matt, Arun, Jan, and numerous collaborators. In a very rough nutshell, you can think of dtts as combining both these potent ingredients to produce something not-entirely-unlike the venerable xts package by our friends Jeff and Josh but using highest-precision nanosecond increments rather than not-quite-microseconds or dates. The package is still somewhat rare and bare: it is mostly just alignment operators. But because of the power and genius of data.table not all that much more is needed because data.table gets us fifteen years of highly refined, tuned and tested code for data slicing, dicing, and aggregation. To which we now humbly add nanosecond-resolution indexing and alignment. The package had been simmering for some time, and does of course take advantage of (a lot of) earlier work by Leonardo on his ztsdb project. We look forward to user feedback and suggestions at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 March 2022

Dirk Eddelbuettel: nanotime 0.3.6 on CRAN: Updates

Leonardo and I are pleased to another update to our nanotime package bringing it to version 0.3.6 which landed on CRAN earlier today. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release corrects subsetting with %in% operator, integrates it better fit in the mixed S3/S4 setup, fixes a negative period parse, and updates class comparisons to rely on inherits(). The NEWS snippet has the full more details.

Changes in version 0.3.6 (2022-03-06)
  • Fix incorrect subsetting with operator %in% (Leonardo in #100 fixing #99).
  • Fix incorrect parsing for negative nanoperiod (Leonardo in #100 fixing #96).
  • Test for class via inherits() (Dirk).

Thanks to my CRANberries there is also a diff to the previous version. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

23 February 2022

Ian Jackson: Rooting an Eos Fairphone 4

Last week I received (finally) my Fairphone 4, supplied with a de-googled operating system, which I had ordered from the E Foundation s shop in December. (I m am very hard on hardware and my venerable Fairphone 2 is really on its last legs.) I expect to have full control over the software on any computing device I own which is as complicated, capable, and therefore, hazardous, as a mobile phone. Unfortunately the Eos image (they prefer to spell it /e/ os , srsly!) doesn t come with a way to get root without taking fairly serious measures including unlocking the bootloader. Unlocking the bootloader wouldn t be desirable for me but I can t live without root. So. I started with these helpful instructions: https://forum.xda-developers.com/t/fairphone-4-root.4376421/ I found the whole process a bit of a trial, and I thought I would write down what I did. But, it s not straightforward, at least for someone like me who only has a dim understanding of all this Android stuff. Unfortunately, due to the number of missteps and restarts, what I actually did is not really a sensible procedure. So here is a retcon of a process I think will work: Unlock the bootloader The E Foundation provide instructions for unlocking the bootloader on a stock FP4, here https://doc.e.foundation/devices/FP4/install and they seem applicable to the Murena phone supplied with Eos pre-installed, too. NB tht unlocking the bootloader wipes the phone. So we do it first. So:
  1. Power on the phone, with no SIM installed
  2. You get a welcome screen.
  3. Skip all things on startup including wifi
  4. Go to the very end of the settings, tap a gazillion times on the phone s version until you re a developer
  5. In the developer settings, allow usb debugging
  6. In the developer settings, allow oem bootloader unlocking
  7. Connect a computer via a USB cable, say yes on phone to USB debugging
  8. adb reboot bootloader
  9. The phone will reboot into a texty kind of screen, the bootloader
  10. fastboot flashing unlock
  11. The phone will reboot, back to the welcome screen
  12. Repeat steps 3-9 (maybe not all are necessary)
  13. fastboot flashing unlock_critical
  14. The phone will reboot, back to the welcome screen
Note that although you are running fastboot, you must run this command with the phone in bootloader mode, not fastboot (aka fastbootd ) mode. If you run fastboot flashing unlcok from fastboot you just get a don t know what you re talking about . I found conflicting instructions on what kind of Vulcan nerve pinches could be used to get into which boot modes, and had poor experiences with those. adb reboot bootloader always worked reliably for me. Some docs say to run fastboot oem unlock; I used flashing. Maybe this depends on the Android tools version. Initial privacy prep and OTA update We want to update the supplied phone OS. The build mine shipped with is too buggy to run Magisk, the application we are going to use to root the phone. (With the pre-installed phone OS, Magisk crashes at the patch boot image step.) But I didn t want to let the phone talk to Google, even for the push notifications registration.
  1. From the welcome screen, skip all things except location, date, time. Notably, do not set up wifi
  2. In settings, microg section
    1. turn off cloud messaging
    2. turn off google safetynet
    3. turn off google registration (NB you must do this after the other two, because their sliders become dysfunctional after you turn google registration off)
    4. turn off both location modules
  3. In settings, location section, turn off allowed location for browser and magic earth
  4. Now go into settings and enable wifi, giving it your wifi details
  5. Tell the phone to update its operating system. This is a big download.
Install Magisk, the root manager (As a starting point I used these instructions https://www.xda-developers.com/how-to-install-magisk/ and a lot of random forum posts.) You will need the official boot.img. Bizarrely there doesn t seem to be a way to obtain this from the phone. Instead, you must download it. You can find it by starting at https://doc.e.foundation/devices/FP4/install which links to https://images.ecloud.global/stable/FP4/. At the time of writing, the most recent version, whose version number seemed to correspond to the OS update I installed above, was IMG-e-0.21-r-20220123158735-stable-FP4.zip.
  1. Download the giant zipfile to your computer
  2. Unzip it to extract boot.img
  3. Copy the file to your phone s storage . Eg, via adb: with the phone booted into the main operating system, using USB debugging, adb push boot.img /storage/self/primary/Download.
  4. On the phone, open the browser, and enter https://f-droid.org. Click on the link to install f-droid. You will need to enable installing apps from the browser (follow the provided flow to the settings, change the setting, and then use Back, and you can do the install). If you wish, you can download the f-droid apk separately on a computer, and verify it with pgp.
  5. Using f-droid, install Magisk. You will need to enable installing apps from f-droid. (I installed Magisk from f-droid because 1. I was going to trust f-droid anyway 2. it has a shorter URL than Magisk s.)
  6. Open the Magisk app. Tell Magisk to install (Magisk, not the app). There will be only one option: patch boot file. Tell it to patch the boot.img file from before.
  7. Transfer the magisk_patched-THING.img back to your computer (eg via adb pull).
  8. adb reboot bootloader
  9. fastboot boot magisk_patched-THING.img (again, NB, from bootloader mode, not from fastboot mode)
  10. In Magisk you ll see it shows as installed. But it s not really; you ve just booted from an image with it. Ask to install Magisk with Direct install .
After you have done all this, I believe that each time you do an over-the-air OS update, you must, between installing the update and rebooting the phone, ask Magisk to Install to inactive slot (after OTA) . Presumably if you forget you must do the fastboot boot dance again. After all this, I was able to use tsu in Termux. There s a strange behaviour with the root prompt you get apropos Termux s request for root; I found that it definitely worked if Termux wasn t the foreground app You have to leave the bootloader unlocked. Howwever, as I understand it, the phone s encryption will still prevent an attacker from hoovering the data out of your phone. The bootloader lock is to prevent someone tricking you into entering the decryption passkey into a trojaned device. Other things to change There are probably other things to change. I have not yet transferred my Signal account from my old phone. It is possible that Signal will require me to re-enable the google push notifications, but I hope that having disabled them in microg it will be happy to use its own system, as it does on my old phone.

comment count unavailable comments

16 January 2022

Chris Lamb: Favourite films of 2021

In my four most recent posts, I went over the memoirs and biographies, the non-fiction, the fiction and the 'classic' novels that I enjoyed reading the most in 2021. But in the very last of my 2021 roundup posts, I'll be going over some of my favourite movies. (Saying that, these are perhaps less of my 'favourite films' than the ones worth remarking on after all, nobody needs to hear that The Godfather is a good movie.) It's probably helpful to remark you that I took a self-directed course in film history in 2021, based around the first volume of Roger Ebert's The Great Movies. This collection of 100-odd movie essays aims to make a tour of the landmarks of the first century of cinema, and I watched all but a handul before the year was out. I am slowly making my way through volume two in 2022. This tome was tremendously useful, and not simply due to the background context that Ebert added to each film: it also brought me into contact with films I would have hardly come through some other means. Would I have ever discovered the sly comedy of Trouble in Paradise (1932) or the touching proto-realism of L'Atalante (1934) any other way? It also helped me to 'get around' to watching films I may have put off watching forever the influential Battleship Potemkin (1925), for instance, and the ur-epic Lawrence of Arabia (1962) spring to mind here. Choosing a 'worst' film is perhaps more difficult than choosing the best. There are first those that left me completely dry (Ready or Not, Written on the Wind, etc.), and those that were simply poorly executed. And there are those that failed to meet their own high opinions of themselves, such as the 'made for Reddit' Tenet (2020) or the inscrutable Vanilla Sky (2001) the latter being an almost perfect example of late-20th century cultural exhaustion. But I must save my most severe judgement for those films where I took a visceral dislike how their subjects were portrayed. The sexually problematic Sixteen Candles (1984) and the pseudo-Catholic vigilantism of The Boondock Saints (1999) both spring to mind here, the latter of which combines so many things I dislike into such a short running time I'd need an entire essay to adequately express how much I disliked it.

Dogtooth (2009) A father, a mother, a brother and two sisters live in a large and affluent house behind a very high wall and an always-locked gate. Only the father ever leaves the property, driving to the factory that he happens to own. Dogtooth goes far beyond any allusion to Josef Fritzl's cellar, though, as the children's education is a grotesque parody of home-schooling. Here, the parents deliberately teach their children the wrong meaning of words (e.g. a yellow flower is called a 'zombie'), all of which renders the outside world utterly meaningless and unreadable, and completely mystifying its very existence. It is this creepy strangeness within a 'regular' family unit in Dogtooth that is both socially and epistemically horrific, and I'll say nothing here of its sexual elements as well. Despite its cold, inscrutable and deadpan surreality, Dogtooth invites all manner of potential interpretations. Is this film about the artificiality of the nuclear family that the West insists is the benchmark of normality? Or is it, as I prefer to believe, something more visceral altogether: an allegory for the various forms of ontological violence wrought by fascism, as well a sobering nod towards some of fascism's inherent appeals? (Perhaps it is both. In 1972, French poststructuralists Gilles and F lix Guattari wrote Anti-Oedipus, which plays with the idea of the family unit as a metaphor for the authoritarian state.) The Greek-language Dogtooth, elegantly shot, thankfully provides no easy answers.

Holy Motors (2012) There is an infamous scene in Un Chien Andalou, the 1929 film collaboration between Luis Bu uel and famed artist Salvador Dal . A young woman is cornered in her own apartment by a threatening man, and she reaches for a tennis racquet in self-defence. But the man suddenly picks up two nearby ropes and drags into the frame two large grand pianos... each leaden with a dead donkey, a stone tablet, a pumpkin and a bewildered priest. This bizarre sketch serves as a better introduction to Leos Carax's Holy Motors than any elementary outline of its plot, which ostensibly follows 24 hours in the life of a man who must play a number of extremely diverse roles around Paris... all for no apparent reason. (And is he even a man?) Surrealism as an art movement gets a pretty bad wrap these days, and perhaps justifiably so. But Holy Motors and Un Chien Andalou serve as a good reminder that surrealism can be, well, 'good, actually'. And if not quite high art, Holy Motors at least demonstrates that surrealism can still unnerving and hilariously funny. Indeed, recalling the whimsy of the plot to a close friend, the tears of laughter came unbidden to my eyes once again. ("And then the limousines...!") Still, it is unclear how Holy Motors truly refreshes surrealism for the twenty-first century. Surrealism was, in part, a reaction to the mechanical and unfeeling brutality of World War I and ultimately sought to release the creative potential of the unconscious mind. Holy Motors cannot be responding to another continental conflagration, and so it appears to me to be some kind of commentary on the roles we exhibit in an era of 'post-postmodernity': a sketch on our age of performative authenticity, perhaps, or an idle doodle on the function and psychosocial function of work. Or perhaps not. After all, this film was produced in a time that offers the near-universal availability of mind-altering substances, and this certainly changes the context in which this film was both created. And, how can I put it, was intended to be watched.

Manchester by the Sea (2016) An absolutely devastating portrayal of a character who is unable to forgive himself and is hesitant to engage with anyone ever again. It features a near-ideal balance between portraying unrecoverable anguish and tender warmth, and is paradoxically grandiose in its subtle intimacy. The mechanics of life led me to watch this lying on a bed in a chain hotel by Heathrow Airport, and if this colourless circumstance blunted the film's emotional impact on me, I am probably thankful for it. Indeed, I find myself reduced in this review to fatuously recalling my favourite interactions instead of providing any real commentary. You could write a whole essay about one particular incident: its surfaces, subtexts and angles... all despite nothing of any substance ever being communicated. Truly stunning.

McCabe & Mrs. Miller (1971) Roger Ebert called this movie one of the saddest films I have ever seen, filled with a yearning for love and home that will not ever come. But whilst it is difficult to disagree with his sentiment, Ebert's choice of sad is somehow not quite the right word. Indeed, I've long regretted that our dictionaries don't have more nuanced blends of tragedy and sadness; perhaps the Ancient Greeks can loan us some. Nevertheless, the plot of this film is of a gambler and a prostitute who become business partners in a new and remote mining town called Presbyterian Church. However, as their town and enterprise booms, it comes to the attention of a large mining corporation who want to bully or buy their way into the action. What makes this film stand out is not the plot itself, however, but its mood and tone the town and its inhabitants seem to be thrown together out of raw lumber, covered alternatively in mud or frozen ice, and their days (and their personalities) are both short and dark in equal measure. As a brief aside, if you haven't seen a Roger Altman film before, this has all the trappings of being a good introduction. As Ebert went on to observe: This is not the kind of movie where the characters are introduced. They are all already here. Furthermore, we can see some of Altman's trademark conversations that overlap, a superb handling of ensemble casts, and a quietly subversive view of the tyranny of 'genre'... and the latter in a time when the appetite for revisionist portrays of the West was not very strong. All of these 'Altmanian' trademarks can be ordered in much stronger measures in his later films: in particular, his comedy-drama Nashville (1975) has 24 main characters, and my jejune interpretation of Gosford Park (2001) is that it is purposefully designed to poke fun those who take a reductionist view of 'genre', or at least on the audience's expectations. (In this case, an Edwardian-era English murder mystery in the style of Agatha Christie, but where no real murder or detection really takes place.) On the other hand, McCabe & Mrs. Miller is actually a poor introduction to Altman. The story is told in a suitable deliberate and slow tempo, and the two stars of the film are shown thoroughly defrocked of any 'star status', in both the visual and moral dimensions. All of these traits are, however, this film's strength, adding up to a credible, fascinating and riveting portrayal of the old West.

Detour (1945) Detour was filmed in less than a week, and it's difficult to decide out of the actors and the screenplay which is its weakest point.... Yet it still somehow seemed to drag me in. The plot revolves around luckless Al who is hitchhiking to California. Al gets a lift from a man called Haskell who quickly falls down dead from a heart attack. Al quickly buries the body and takes Haskell's money, car and identification, believing that the police will believe Al murdered him. An unstable element is soon introduced in the guise of Vera, who, through a set of coincidences that stretches credulity, knows that this 'new' Haskell (ie. Al pretending to be him) is not who he seems. Vera then attaches herself to Al in order to blackmail him, and the world starts to spin out of his control. It must be understood that none of this is executed very well. Rather, what makes Detour so interesting to watch is that its 'errors' lend a distinctively creepy and unnatural hue to the film. Indeed, in the early twentieth century, Sigmund Freud used the word unheimlich to describe the experience of something that is not simply mysterious, but something creepy in a strangely familiar way. This is almost the perfect description of watching Detour its eerie nature means that we are not only frequently second-guessed about where the film is going, but are often uncertain whether we are watching the usual objective perspective offered by cinema. In particular, are all the ham-fisted segues, stilted dialogue and inscrutable character motivations actually a product of Al inventing a story for the viewer? Did he murder Haskell after all, despite the film 'showing' us that Haskell died of natural causes? In other words, are we watching what Al wants us to believe? Regardless of the answers to these questions, the film succeeds precisely because of its accidental or inadvertent choices, so it is an implicit reminder that seeking the director's original intention in any piece of art is a complete mirage. Detour is certainly not a good film, but it just might be a great one. (It is a short film too, and, out of copyright, it is available online for free.)

Safe (1995) Safe is a subtly disturbing film about an upper-middle-class housewife who begins to complain about vague symptoms of illness. Initially claiming that she doesn't feel right, Carol starts to have unexplained headaches, a dry cough and nosebleeds, and eventually begins to have trouble breathing. Carol's family doctor treats her concerns with little care, and suggests to her husband that she sees a psychiatrist. Yet Carol's episodes soon escalate. For example, as a 'homemaker' and with nothing else to occupy her, Carol's orders a new couch for a party. But when the store delivers the wrong one (although it is not altogether clear that they did), Carol has a near breakdown. Unsure where to turn, an 'allergist' tells Carol she has "Environmental Illness," and so Carol eventually checks herself into a new-age commune filled with alternative therapies. On the surface, Safe is thus a film about the increasing about of pesticides and chemicals in our lives, something that was clearly felt far more viscerally in the 1990s. But it is also a film about how lack of genuine healthcare for women must be seen as a critical factor in the rise of crank medicine. (Indeed, it made for something of an uncomfortable watch during the coronavirus lockdown.) More interestingly, however, Safe gently-yet-critically examines the psychosocial causes that may be aggravating Carol's illnesses, including her vacant marriage, her hollow friends and the 'empty calorie' stimulus of suburbia. None of this should be especially new to anyone: the gendered Victorian term 'hysterical' is often all but spoken throughout this film, and perhaps from the very invention of modern medicine, women's symptoms have often regularly minimised or outright dismissed. (Hilary Mantel's 2003 memoir, Giving Up the Ghost is especially harrowing on this.) As I opened this review, the film is subtle in its messaging. Just to take one example from many, the sound of the cars is always just a fraction too loud: there's a scene where a group is eating dinner with a road in the background, and the total effect can be seen as representing the toxic fumes of modernity invading our social lives and health. I won't spoiler the conclusion of this quietly devasting film, but don't expect a happy ending.

The Driver (1978) Critics grossly misunderstood The Driver when it was first released. They interpreted the cold and unemotional affect of the characters with the lack of developmental depth, instead of representing their dissociation from the society around them. This reading was encouraged by the fact that the principal actors aren't given real names and are instead known simply by their archetypes instead: 'The Driver', 'The Detective', 'The Player' and so on. This sort of quasi-Jungian erudition is common in many crime films today (Reservoir Dogs, Kill Bill, Layer Cake, Fight Club), so the critics' misconceptions were entirely reasonable in 1978. The plot of The Driver involves the eponymous Driver, a noted getaway driver for robberies in Los Angeles. His exceptional talent has far prevented him from being captured thus far, so the Detective attempts to catch the Driver by pardoning another gang if they help convict the Driver via a set-up robbery. To give himself an edge, however, The Driver seeks help from the femme fatale 'Player' in order to mislead the Detective. If this all sounds eerily familiar, you would not be far wrong. The film was essentially remade by Nicolas Winding Refn as Drive (2011) and in Edgar Wright's 2017 Baby Driver. Yet The Driver offers something that these neon-noir variants do not. In particular, the car chases around Los Angeles are some of the most captivating I've seen: they aren't thrilling in the sense of tyre squeals, explosions and flying boxes, but rather the vehicles come across like wild animals hunting one another. This feels especially so when the police are hunting The Driver, which feels less like a low-stakes game of cat and mouse than a pack of feral animals working together a gang who will tear apart their prey if they find him. In contrast to the undercar neon glow of the Fast & Furious franchise, the urban realism backdrop of the The Driver's LA metropolis contributes to a sincere feeling of artistic fidelity as well. To be sure, most of this is present in the truly-excellent Drive, where the chase scenes do really communicate a credible sense of stakes. But the substitution of The Driver's grit with Drive's soft neon tilts it slightly towards that common affliction of crime movies: style over substance. Nevertheless, I can highly recommend watching The Driver and Drive together, as it can tell you a lot about the disconnected socioeconomic practices of the 1980s compared to the 2010s. More than that, however, the pseudo-1980s synthwave soundtrack of Drive captures something crucial to analysing the world of today. In particular, these 'sounds from the past filtered through the present' bring to mind the increasing role of nostalgia for lost futures in the culture of today, where temporality and pop culture references are almost-exclusively citational and commemorational.

The Souvenir (2019) The ostensible outline of this quietly understated film follows a shy but ambitious film student who falls into an emotionally fraught relationship with a charismatic but untrustworthy older man. But that doesn't quite cover the plot at all, for not only is The Souvenir a film about a young artist who is inspired, derailed and ultimately strengthened by a toxic relationship, it is also partly a coming-of-age drama, a subtle portrait of class and, finally, a film about the making of a film. Still, one of the geniuses of this truly heartbreaking movie is that none of these many elements crowds out the other. It never, ever feels rushed. Indeed, there are many scenes where the camera simply 'sits there' and quietly observes what is going on. Other films might smother themselves through references to 18th-century oil paintings, but The Souvenir somehow evades this too. And there's a certain ring of credibility to the story as well, no doubt in part due to the fact it is based on director Joanna Hogg's own experiences at film school. A beautifully observed and multi-layered film; I'll be happy if the sequel is one-half as good.

The Wrestler (2008) Randy 'The Ram' Robinson is long past his prime, but he is still rarin' to go in the local pro-wrestling circuit. Yet after a brutal beating that seriously threatens his health, Randy hangs up his tights and pursues a serious relationship... and even tries to reconnect with his estranged daughter. But Randy can't resist the lure of the ring, and readies himself for a comeback. The stage is thus set for Darren Aronofsky's The Wrestler, which is essentially about what drives Randy back to the ring. To be sure, Randy derives much of his money from wrestling as well as his 'fitness', self-image, self-esteem and self-worth. Oh, it's no use insisting that wrestling is fake, for the sport is, needless to say, Randy's identity; it's not for nothing that this film is called The Wrestler. In a number of ways, The Sound of Metal (2019) is both a reaction to (and a quiet remake of) The Wrestler, if only because both movies utilise 'cool' professions to explore such questions of identity. But perhaps simply when The Wrestler was produced makes it the superior film. Indeed, the role of time feels very important for the Wrestler. In the first instance, time is clearly taking its toll on Randy's body, but I felt it more strongly in the sense this was very much a pre-2008 film, released on the cliff-edge of the global financial crisis, and the concomitant precarity of the 2010s. Indeed, it is curious to consider that you couldn't make The Wrestler today, although not because the relationship to work has changed in any fundamentalway. (Indeed, isn't it somewhat depressing the realise that, since the start of the pandemic and the 'work from home' trend to one side, we now require even more people to wreck their bodies and mental health to cover their bills?) No, what I mean to say here is that, post-2016, you cannot portray wrestling on-screen without, how can I put it, unwelcome connotations. All of which then reminds me of Minari's notorious red hat... But I digress. The Wrestler is a grittily stark darkly humorous look into the life of a desperate man and a sorrowful world, all through one tragic profession.

Thief (1981) Frank is an expert professional safecracker and specialises in high-profile diamond heists. He plans to use his ill-gotten gains to retire from crime and build a life for himself with a wife and kids, so he signs on with a top gangster for one last big score. This, of course, could be the plot to any number of heist movies, but Thief does something different. Similar to The Wrestler and The Driver (see above) and a number of other films that I watched this year, Thief seems to be saying about our relationship to work and family in modernity and postmodernity. Indeed, the 'heist film', we are told, is an understudied genre, but part of the pleasure of watching these films is said to arise from how they portray our desired relationship to work. In particular, Frank's desire to pull off that last big job feels less about the money it would bring him, but a displacement from (or proxy for) fulfilling some deep-down desire to have a family or indeed any relationship at all. Because in theory, of course, Frank could enter into a fulfilling long-term relationship right away, without stealing millions of dollars in diamonds... but that's kinda the entire point: Frank needing just one more theft is an excuse to not pursue a relationship and put it off indefinitely in favour of 'work'. (And being Federal crimes, it also means Frank cannot put down meaningful roots in a community.) All this is communicated extremely subtly in the justly-lauded lowkey diner scene, by far the best scene in the movie. The visual aesthetic of Thief is as if you set The Warriors (1979) in a similarly-filthy Chicago, with the Xenophon-inspired plot of The Warriors replaced with an almost deliberate lack of plot development... and the allure of The Warriors' fantastical criminal gangs (with their alluringly well-defined social identities) substituted by a bunch of amoral individuals with no solidarity beyond the immediate moment. A tale of our time, perhaps. I should warn you that the ending of Thief is famously weak, but this is a gritty, intelligent and strangely credible heist movie before you get there.

Uncut Gems (2019) The most exhausting film I've seen in years; the cinematic equivalent of four cups of double espresso, I didn't even bother even trying to sleep after downing Uncut Gems late one night. Directed by the two Safdie Brothers, it often felt like I was watching two films that had been made at the same time. (Or do I mean two films at 2X speed?) No, whatever clumsy metaphor you choose to adopt, the unavoidable effect of this film's finely-tuned chaos is an uncompromising and anxiety-inducing piece of cinema. The plot follows Howard as a man lost to his countless vices mostly gambling with a significant side hustle in adultery, but you get the distinct impression he would be happy with anything that will give him another high. A true junkie's junkie, you might say. You know right from the beginning it's going to end in some kind of disaster, the only question remaining is precisely how and what. Portrayed by an (almost unrecognisable) Adam Sandler, there's an uncanny sense of distance in the emotional chasm between 'Sandler-as-junkie' and 'Sandler-as-regular-star-of-goofy-comedies'. Yet instead of being distracting and reducing the film's affect, this possibly-deliberate intertextuality somehow adds to the masterfully-controlled mayhem. My heart races just at the memory. Oof.

Woman in the Dunes (1964) I ended up watching three films that feature sand this year: Denis Villeneuve's Dune (2021), Lawrence of Arabia (1962) and Woman in the Dunes. But it is this last 1964 film by Hiroshi Teshigahara that will stick in my mind in the years to come. Sure, there is none of the Medician intrigue of Dune or the Super Panavision-70 of Lawrence of Arabia (or its quasi-orientalist score, itself likely stolen from Anton Bruckner's 6th Symphony), but Woman in the Dunes doesn't have to assert its confidence so boldly, and it reveals the enormity of its plot slowly and deliberately instead. Woman in the Dunes never rushes to get to the film's central dilemma, and it uncovers its terror in little hints and insights, all whilst establishing the daily rhythm of life. Woman in the Dunes has something of the uncanny horror as Dogtooth (see above), as well as its broad range of potential interpretations. Both films permit a wide array of readings, without resorting to being deliberately obscurantist or being just plain random it is perhaps this reason why I enjoyed them so much. It is true that asking 'So what does the sand mean?' sounds tediously sophomoric shorn of any context, but it somehow applies to this thoughtfully self-contained piece of cinema.

A Quiet Place (2018) Although A Quiet Place was not actually one of the best films I saw this year, I'm including it here as it is certainly one of the better 'mainstream' Hollywood franchises I came across. Not only is the film very ably constructed and engages on a visceral level, I should point out that it is rare that I can empathise with the peril of conventional horror movies (and perhaps prefer to focus on its cultural and political aesthetics), but I did here. The conceit of this particular post-apocalyptic world is that a family is forced to live in almost complete silence while hiding from creatures that hunt by sound alone. Still, A Quiet Place engages on an intellectual level too, and this probably works in tandem with the pure 'horrorific' elements and make it stick into your mind. In particular, and to my mind at least, A Quiet Place a deeply American conservative film below the surface: it exalts the family structure and a certain kind of sacrifice for your family. (The music often had a passacaglia-like strain too, forming a tombeau for America.) Moreover, you survive in this dystopia by staying quiet that is to say, by staying stoic suggesting that in the wake of any conflict that might beset the world, the best thing to do is to keep quiet. Even communicating with your loved ones can be deadly to both of you, so not emote, acquiesce quietly to your fate, and don't, whatever you do, speak up. (Or join a union.) I could go on, but The Quiet Place is more than this. It's taut and brief, and despite cinema being an increasingly visual medium, it encourages its audience to develop a new relationship with sound.

1 January 2022

Utkarsh Gupta: FOSS Activites in December 2021

Here s my (twenty-seventh) monthly but brief update about the activities I ve done in the F/L/OSS world.

Debian
This was my 36th month of actively contributing to Debian. I became a DM in late March 2019 and a DD on Christmas 19! \o/ Just churning through the backlog again this month. Ugh. Anyway, I did the following stuff in Debian:

Uploads and bug fixes:
  • ruby2.7 (2.7.5-1) - New upstream version fixing 3 new CVEs.

Other $things:
  • Mentoring for newcomers.
  • Moderation of -project mailing list.

Ubuntu
This was my 11th month of actively contributing to Ubuntu. Now that I ve joined Canonical to work on Ubuntu full-time, there s a bunch of things I do! \o/ I mostly worked on different things, I guess. I was too lazy to maintain a list of things I worked on so there s no concrete list atm. Maybe I ll get back to this section later or will start to list stuff from next year onward, as I was doing before. :D

Debian (E)LTS
Debian Long Term Support (LTS) is a project to extend the lifetime of all Debian stable releases to (at least) 5 years. Debian LTS is not handled by the Debian security team, but by a separate group of volunteers and companies interested in making it a success. And Debian Extended LTS (ELTS) is its sister project, extending support to the Jessie release (+2 years after LTS support). This was my twenty-seventh month as a Debian LTS and eighteenth month as a Debian ELTS paid contributor.
I was assigned 40.00 hours for LTS and 60.00 hours for ELTS and worked on the following things:
(since I had a 3-week vacation, I wanted to wrap things up that were pending and so I worked for 20h more for LTS, which I ll compensate the next month!)

LTS CVE Fixes and Announcements:

ELTS CVE Fixes and Announcements:
  • Issued ELA 525-2, fixing CVE-2021-43527, for nss.
    For Debian 8 jessie, these problems have been fixed in version 2:3.26-1+debu8u15.
  • Issued ELA 530-1, for systemd.
    For Debian 8 jessie, these problems have been fixed in version 215-17+deb8u14.
  • Issued ELA 531-1, fixing CVE-2021-41817 and CVE-2021-41819, for ruby2.1.
    For Debian 8 jessie, these problems have been fixed in version 2.1.5-2+deb8u13.
  • Issued ELA 533-1, fixing CVE-2018-12020, for python-gnupg.
    For Debian 8 jessie, these problems have been fixed in version 0.3.6-1+deb8u2.
  • Issued ELA 536-1, fixing CVE-2021-43818, for lxml.
    For Debian 8 jessie, these problems have been fixed in version Prior to version 4.6.5, the HTML Cleaner in lxml.html lets certain.
  • Started working on src:samba for CVE-2020-25717 to CVE-2020-25722 and CVE-2021-23192 for jessie and stretch, both.
    The version difference b/w the suites are a bit too much for the patch(es) to be easily backported. I ve talked to Anton to work something out. \o/
  • Found the problem w/ libjdom1-java. Will have to roll the regression upload.
    I ve prepared the patch but needs some testing to be finally rolled out. Same for stretch.

Other (E)LTS Work:
  • Front-desk duty from 29-11 to 05-12 and 20-12 to 26-12 for both LTS and ELTS.
  • Triaged ffmpeg, git, gpac, inetutils, mc, modsecurity-crs, node-object-path, php-pear, systemd-cron, node-tar, ruby2.3, gst-plugins-bad0.10, npm, nltk, request-tracker4, ros-ros-comm, mediawiki, ruby2.1, ckeditor, ntfs-3g, tiff, wordpress, and jsoup, udisks2, libgit2, python3.5, python3.4, and openssh.
  • Mark CVE-2021-38171/ffmpeg as postponed for stretch.
  • Mark CVE-2021-40330/git as no-dsa for stretch and jessie.
  • Mark CVE-2020-19481/gpac as ignored for stretch.
  • Mark CVE-2021-40491/inetutils as no-dsa for stretch.
  • Mark CVE-2021-36370/mc as no-dsa for stretch and jessie.
  • Mark CVE-2021-35368/modsecurity-crs as no-dsa for stretch.
  • Mark CVE-2021-23434/node-object-path as end-of-life for stretch.
  • Mark CVE-2021-32610/php-pear as no-dsa for stretch.
  • Mark CVE-2017-9525/systemd-cron as no-dsa for stretch.
  • Mark CVE-2021-37701/node-tar as end-of-life for stretch.
  • Mark CVE-2021-37712/node-tar as end-of-life in stretch.
  • Mark CVE-2021-39201/wordpress as not-affected for jessie.
  • Mark CVE-2020-19143/tiff as not-affected for stretch and jessie.
  • Mark CVE-2021-38562/request-tracker4 as no-dsa for stretch.
  • Mark CVE-2021-37146/ros-ros-comm as no-dsa for stretch.
  • Mark CVE-2021-28965/ruby2.1 as ignored for jessie.
  • Mark CVE-2021-37714/jsoup as ignored for jessie.
  • Mark CVE-2021-41617/openssh as no-dsa for jessie.
  • Auto EOL ed ardour, nltk, request-tracker4, python-scrapy, webkit2gtk, and linux for jessie.
  • Attended monthly Debian LTS meeting.
  • Answered questions (& discussions) on IRC (#debian-lts and #debian-elts).
  • General and other discussions on LTS private and public mailing list.

Debian LTS Survey I ve spent 5 hours on the LTS survey on the following bits:
  • Went through the old content on the previous survey.
  • Reviewed the new content - still more work to do.
  • Discussed the survey bits in the team meeting.
  • Partly reviewing the questions of the survey.
  • Walking through the instance to find the doability of the tasks discussed in the meeting.
  • Segregating and staging questions. More work to do here.

Until next time.
:wq for today.

29 December 2021

Russ Allbery: Review: A Spindle Splintered

Review: A Spindle Splintered, by Alix E. Harrow
Series: Fractured Fables #1
Publisher: Tordotcom
Copyright: 2021
ISBN: 1-250-76536-6
Format: Kindle
Pages: 121
Zinnia Gray lives in rural Ohio and is obsessed with Sleeping Beauty, even though the fairy tale objectively sucks. That has a lot to do with having Generalized Roseville Malady, an always-fatal progressive amyloidosis caused by teratogenic industrial waste. No one with GRM has ever lived to turn twenty-two. A Spindle Splintered opens on Zinnia's twenty-first birthday. For her birthday, her best (and only) friend Charm (Charmaine Baldwin) throws her a party at the tower. There aren't a lot of towers in Ohio; this one is a guard tower at an abandoned state penitentiary occasionally used by the local teenagers, which is not quite the image one would get from fairy tales. But Charm fills it with roses, guests wearing cheap fairy wings, beer, and even an honest-to-god spinning wheel. At the end of the night, Zinnia decides to prick her finger on the spindle on a whim. Much to both of their surprise, that's enough to trigger some form of magic in Zinnia's otherwise entirely mundane world. She doesn't fall asleep for a thousand years, but she does get dumped into an actual fairy-tale tower near an actual princess, just in time to prevent her from pricking her finger. This is, as advertised on the tin, a fractured fairy tale, but it's one that barely introduces the Sleeping Beauty story before driving it entirely off the rails. It's also a fractured fairy tale in which the protagonist knows exactly what sort of story she's in, given that she graduated early from high school and has a college degree in folk studies. (Dying girl rule #1: move fast.) And it's one in which the fairy tale universe still has cell reception, if not chargers, which means you can text your best friend sarcastic commentary on your multiversal travels. Also, cell phone pictures of the impossibly beautiful princess. I should mention up-front that I have not watched Spider-Man: Into the Spider-Verse (yes, I know, I'm sure it's wonderful, I just don't watch things, basically ever), which is a quite explicit inspiration for this story. I'm therefore not sure how obvious the story would be to people familiar with that movie. Even with my familiarity with the general genre of fractured fairy tales, nothing plot-wise here was all that surprising. What carries this story is the characters and the emotional core, particularly Zinnia's complex and sardonic feelings about dying and the note-perfect friendship between Zinnia and Charm.
"You know it wasn't originally a spinning wheel in the story?" I offer, because alcohol transforms me into a chatty Wikipedia page.
A Spindle Splintered is told from Zinnia's first-person perspective, and Zinnia is great. My favorite thing about Harrow's writing is the fierce and complex emotions of her characters. The overall tone is lighter than The Once and Future Witches or The Ten Thousand Doors of January, but Harrow doesn't shy away from showing the reader Zinnia's internal thought process about her illness (and her eye-rolling bemusement at some of the earlier emotional stages she went through).
Dying girl rule #3 is no romance, because my entire life is one long trolley problem and I don't want to put any more bodies on the tracks. (I've spent enough time in therapy to know that this isn't "a healthy attitude towards attachment," but I personally feel that accepting my own imminent mortality is enough work without also having a healthy attitude about it.)
There's a content warning for parents here, since Harrow spends some time on the reaction of Zinnia's parents and the complicated dance between hope, despair, smothering, and freedom that she and they had to go through. There were no easy answers and all balances were fragile, but Zinnia always finds her feet. For me, Harrow's character writing is like emotional martial arts: rolling with punches, taking falls, clear-eyed about the setbacks, but always finding a new point of stability to punch back at the world. Zinnia adds just enough teenage irreverence and impatience to blunt the hardest emotional hits. I really enjoy reading it. The one caution I will make about that part of the story is that the focus is on Zinnia's projected lifespan and not on her illness specifically. Harrow uses it as setup to dig into how she and her parents would react to that knowledge (and I thought those parts were done well), but it's told from the perspective of "what would you do if you knew your date of death," not from the perspective of someone living with a disability. It is to some extent disability as plot device, and like the fairy tale that it's based on, it's deeply invested in the "find a cure" approach to the problem. I'm not disabled and am not the person to ask about how well a story handles disability, but I suspect this one may leave something to be desired. I thought the opening of this story is great. Zinnia is a great first-person protagonist and the opening few chapters are overflowing with snark and acerbic commentary. Dumping Zinnia into another world but having text messaging still work is genius, and I kind of wish Harrow had made that even more central to the book. The rest of the story was good but not as good, and the ending was somewhat predictable and a bit of a deus ex machina. But the characters carried it throughout, and I will happily read more of this. Recommended, with the caveat about disability and the content warning for parents. Followed by A Mirror Mended, which I have already pre-ordered. Rating: 8 out of 10

24 December 2021

Vincent Bernat: Automatic login with startx and systemd

If your workstation is using full-disk encryption, you may want to jump directly to your desktop environment after entering the passphrase to decrypt the disk. Many display managers like GDM and LightDM have an autologin feature. However, only GDM can run Xorg with standard user privileges. Here is an alternative using startx and a systemd service:
[Unit]
Description=X11 session for bernat
After=graphical.target systemd-user-sessions.service
[Service]
User=bernat
WorkingDirectory=~
PAMName=login
Environment=XDG_SESSION_TYPE=x11
TTYPath=/dev/tty8
StandardInput=tty
UnsetEnvironment=TERM
UtmpIdentifier=tty8
UtmpMode=user
StandardOutput=journal
ExecStartPre=/usr/bin/chvt 8
ExecStart=/usr/bin/startx -- vt8 -keeptty -verbose 3 -logfile /dev/null
Restart=no
[Install]
WantedBy=graphical.target
Let me explain each block: Drop this unit in /etc/systemd/system/x11-autologin.service and enable it with systemctl enable x11-autologin.service. Xorg is now running rootless and logging into the journal. After using it for a few months, I didn t notice any regression compared to LightDM with autologin.

  1. For more information on how logind provides access to devices, see this blog post. The method names do not match the current implementation, but the concepts are still correct. Xorg takes control of the session when the TTY is active.
  2. Xorg could change the type of the session itself after taking control of it, but it does not.
  3. There is some code in Xorg to do that, but it is executed too late and fails with: xf86OpenConsole: VT_ACTIVATE failed: Operation not permitted.

15 December 2021

Dirk Eddelbuettel: nanotime 0.3.5 on CRAN: Update

Another (minor) nanotime release, now at version 0.3.5, just arrived at CRAN. It follows the updates RDieHarder 0.2.3 and RcppCCTZ 0.2.10 earlier today in bringing a patch kindly prepared by Tomas Kalibera for the upcoming (and very useful) UCRT changes for Windows involving small build changes for the updated Windows toolchain. nanotime relies on the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. The NEWS snippet adds more details.

Changes in version 0.3.5 (2021-12-14)
  • Applied patch by Tomas Kalibera for Windows UCRT under the upcoming R 4.2.0 expected for April.

Thanks to my CRANberries there is also a diff to the previous version. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Next.

Previous.